Skip to content

RUM-767: Improve reading-errors telemetry for dropped events#3598

Draft
satween wants to merge 2 commits into
developfrom
tvaleev/feature/RUM-767-dropped-events-rum
Draft

RUM-767: Improve reading-errors telemetry for dropped events#3598
satween wants to merge 2 commits into
developfrom
tvaleev/feature/RUM-767-dropped-events-rum

Conversation

@satween

@satween satween commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Improves internal telemetry for events dropped during file read/write on the persistence layer:

  • Adds TelemetryAttributes (feature.name, event.dropped_bytes) so dropped-event error telemetry carries consistent, queryable attributes across features (RUM, logs, tracing) and across read/write paths.
  • Threads these attributes through RumDataWriter, FileEventBatchWriter, BatchFileReaderWriter/PlainBatchFileReaderWriter, TLVBlock/TLVBlockFileReader, CoreFeature, and SdkFeature so telemetry errors report which feature and how many bytes were involved when an event is dropped.
  • Fixes a detekt violation introduced by the above changes.

Motivation

When events are dropped during persistence (e.g. oversized batches), the existing telemetry lacked enough context to diagnose which feature was affected and how large the dropped payload was. RUM-767 tracks improving this visibility.

Additional Notes

  • New/updated unit tests for TLVBlockFileReader, RumDataWriter, FileEventBatchWriter, BatchFileReaderWriter/PlainBatchFileReaderWriter, TLVBlock.
  • dd-sdk-android-internal public API surface updated (apiSurface / .api files) for the new TelemetryAttributes object.
  • Draft PR — pushed directly per request; full local_ci (tests/lint/apiSurface regen) has not been re-run as part of this push.

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Make sure you discussed the feature or bugfix with the maintaining team in an Issue
  • Make sure each commit and the PR mention the Issue number (cf the CONTRIBUTING doc)

@codecov-commenter

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 87.09677% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.78%. Comparing base (0443e2a) to head (cfc0a62).

Files with missing lines Patch % Lines
...rsistence/file/batch/PlainBatchFileReaderWriter.kt 68.75% 5 Missing ⚠️
...tadog/android/rum/internal/domain/RumDataWriter.kt 90.00% 1 Missing and 1 partial ⚠️
...in/com/datadog/android/core/internal/SdkFeature.kt 66.67% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3598      +/-   ##
===========================================
- Coverage    72.85%   72.78%   -0.07%     
===========================================
  Files          975      975              
  Lines        35277    35308      +31     
  Branches      5972     5983      +11     
===========================================
- Hits         25700    25697       -3     
- Misses        7915     7931      +16     
- Partials      1662     1680      +18     
Files with missing lines Coverage Δ
...n/com/datadog/android/core/internal/CoreFeature.kt 88.80% <100.00%> (+0.03%) ⬆️
...d/core/internal/persistence/ConsentAwareStorage.kt 93.98% <100.00%> (+0.07%) ⬆️
.../core/internal/persistence/FileEventBatchWriter.kt 96.49% <100.00%> (+0.26%) ⬆️
...al/persistence/file/batch/BatchFileReaderWriter.kt 100.00% <100.00%> (ø)
...id/core/internal/persistence/tlvformat/TLVBlock.kt 100.00% <100.00%> (ø)
...ternal/persistence/tlvformat/TLVBlockFileReader.kt 100.00% <100.00%> (ø)
...lin/com/datadog/android/rum/internal/RumFeature.kt 92.54% <100.00%> (-0.22%) ⬇️
...in/com/datadog/android/core/internal/SdkFeature.kt 90.00% <66.67%> (+0.05%) ⬆️
...tadog/android/rum/internal/domain/RumDataWriter.kt 93.75% <90.00%> (-6.25%) ⬇️
...rsistence/file/batch/PlainBatchFileReaderWriter.kt 75.83% <68.75%> (-0.03%) ⬇️

... and 40 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@satween

satween commented Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

@codex review

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves internal telemetry for persistence-layer drops and read/write errors by introducing consistent attribute keys (e.g., feature.name, event.dropped_bytes) and threading feature context through the persistence stack so telemetry is easier to query across features and code paths.

Changes:

  • Added TelemetryAttributes (internal module) and updated API surface files accordingly.
  • Propagated featureName/size context through core persistence components (FileEventBatchWriter, BatchFileReaderWriter/PlainBatchFileReaderWriter, SdkFeature, CoreFeature) and updated unit tests.
  • Added a max-item-size drop path in RumDataWriter with telemetry/user logging and expanded RUM unit tests for size-exceeded scenarios.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
features/dd-sdk-android-rum/src/test/kotlin/com/datadog/android/rum/internal/domain/RumDataWriterTest.kt Adds unit coverage for RUM events dropped when serialized payload exceeds maxItemSize.
features/dd-sdk-android-rum/src/main/kotlin/com/datadog/android/rum/internal/RumFeature.kt Wires storageConfiguration.maxItemSize into RumDataWriter.
features/dd-sdk-android-rum/src/main/kotlin/com/datadog/android/rum/internal/domain/RumDataWriter.kt Adds size-limit drop handling with user+telemetry logging and dropped-bytes attribute.
dd-sdk-android-internal/src/main/java/com/datadog/android/internal/telemetry/TelemetryAttributes.kt Introduces shared telemetry attribute keys for persistence diagnostics.
dd-sdk-android-internal/api/dd-sdk-android-internal.api API dump updated for the new TelemetryAttributes public type.
dd-sdk-android-internal/api/apiSurface API surface updated to include TelemetryAttributes.
dd-sdk-android-core/src/test/kotlin/com/datadog/android/core/internal/persistence/tlvformat/TLVBlockTest.kt Updates expectations for TLV serialization oversize logging targets/level.
dd-sdk-android-core/src/test/kotlin/com/datadog/android/core/internal/persistence/tlvformat/TLVBlockFileReaderTest.kt Adds/updates tests for invalid TLV declared length handling and logging.
dd-sdk-android-core/src/test/kotlin/com/datadog/android/core/internal/persistence/FileEventBatchWriterTest.kt Updates tests for new telemetry targets and added dropped-bytes/feature attributes.
dd-sdk-android-core/src/test/kotlin/com/datadog/android/core/internal/persistence/file/batch/PlainBatchFileReaderWriterTest.kt Updates tests for new constructor signature and feature-name telemetry attributes.
dd-sdk-android-core/src/test/kotlin/com/datadog/android/core/internal/persistence/file/batch/BatchFileReaderWriterTest.kt Updates tests for create(..., featureName) signature.
dd-sdk-android-core/src/test/kotlin/com/datadog/android/core/internal/CoreFeatureTest.kt Updates usage of BatchFileReaderWriter.create to include feature name.
dd-sdk-android-core/src/main/kotlin/com/datadog/android/core/internal/SdkFeature.kt Threads featureName into persistence reader/writer creation.
dd-sdk-android-core/src/main/kotlin/com/datadog/android/core/internal/persistence/tlvformat/TLVBlockFileReader.kt Adds declared-length validation with telemetry/user error logging.
dd-sdk-android-core/src/main/kotlin/com/datadog/android/core/internal/persistence/tlvformat/TLVBlock.kt Changes oversize TLV serialization log to ERROR and USER+TELEMETRY; exposes size constant internally.
dd-sdk-android-core/src/main/kotlin/com/datadog/android/core/internal/persistence/FileEventBatchWriter.kt Adds featureName and attaches feature.name + event.dropped_bytes on oversize drops.
dd-sdk-android-core/src/main/kotlin/com/datadog/android/core/internal/persistence/file/batch/PlainBatchFileReaderWriter.kt Adds featureName and attaches it to read-side telemetry errors/warnings.
dd-sdk-android-core/src/main/kotlin/com/datadog/android/core/internal/persistence/file/batch/BatchFileReaderWriter.kt Extends factory to require featureName and passes it down.
dd-sdk-android-core/src/main/kotlin/com/datadog/android/core/internal/persistence/ConsentAwareStorage.kt Passes featureName into FileEventBatchWriter.
dd-sdk-android-core/src/main/kotlin/com/datadog/android/core/internal/CoreFeature.kt Ensures core-created batch reader/writers include Feature.RUM_FEATURE_NAME.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +63 to +66
additionalProperties = mapOf(
RUM_EVENT_TYPE to rumEventType,
TelemetryAttributes.EVENT_DROPPED_BYTES to byteArray.size
)
}

@Test
fun `M return empty array and warn W read() { declared block length exceeds limit }`(
}

@Test
fun `M return empty array and warn W read() { declared block length is negative }`(

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cfc0a6226a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +63 to +66
additionalProperties = mapOf(
RUM_EVENT_TYPE to rumEventType,
TelemetryAttributes.EVENT_DROPPED_BYTES to byteArray.size
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Add feature name to RUM drop telemetry

When a serialized RUM event exceeds maxItemSize, this branch returns before FileEventBatchWriter.write() runs, so the telemetry emitted here is the only drop signal for RUM. Unlike the shared storage path inspected in FileEventBatchWriter, this map omits TelemetryAttributes.FEATURE_NAME, so dashboards or alerts querying feature.name will miss RUM oversized-event drops even though logs/traces include that attribute.

Useful? React with 👍 / 👎.

level = InternalLogger.Level.ERROR,
targets = listOf(InternalLogger.Target.USER, InternalLogger.Target.TELEMETRY),
messageBuilder = { WARNING_NOT_ALL_DATA_READ.format(Locale.US, file.path) },
additionalProperties = mapOf(TelemetryAttributes.FEATURE_NAME to featureName)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Report dropped bytes for corrupted batch reads

When a batch file is truncated or corrupt and this branch logs WARNING_NOT_ALL_DATA_READ, the unread bytes represented by remaining are the data that will be lost once the partially read batch is confirmed/deleted. The new write-side oversized-event telemetry includes event.dropped_bytes, but this read-side drop signal only has feature.name, so read corruption remains unqueryable by loss size despite the value being available here.

Useful? React with 👍 / 👎.

Comment on lines +126 to +129
targets = listOf(InternalLogger.Target.USER, InternalLogger.Target.TELEMETRY),
messageBuilder = {
CORRUPT_DATA_LENGTH_ERROR.format(Locale.US, maxEntrySize, lengthData)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Include feature name on TLV datastore telemetry

When a feature's DataStore file has an invalid declared TLV length, this new telemetry error is emitted without TelemetryAttributes.FEATURE_NAME, even though SdkFeature.prepareDataStoreHandler constructs the reader per wrappedFeature.name. As a result, corrupt datastore-read drops cannot be attributed to RUM/logs/tracing/session-replay in telemetry queries, unlike the batch read/write telemetry added in this change.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants